 human data



Neural Information Processing Systems

Extensive experimental results demonstrate that GeneMAN generates high-quality 3D human models from a single input image, outperforming prior state-of-the-art methods. Notably, GeneMAN generalizes much better to in-the-wild images, often yielding high-quality 3D human models in natural poses with common items, regardless of the body proportions in the input images.


EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data

Neural Information Processing Systems

Egocentric human experience data presents a vast resource for scaling up end-to-end imitation learning for robotic manipulation. However, significant domain gaps in visual appearance, sensor modalities, and kinematics between humans and robots impede knowledge transfer. This paper presents EgoBridge, a unified co-training framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. Through a measure of discrepancy on the joint policy latent features and actions based on Optimal Transport (OT), we learn observation representations that not only align between the human and robot domains but also preserve the action-relevant information critical for policy learning. EgoBridge improves the absolute policy success rate by 44% over human-augmented cross-embodiment baselines in three real-world single-arm and bimanual manipulation tasks. EgoBridge also generalizes to new objects, scenes, and tasks seen only in human data, where baselines fail entirely. Videos and additional information can be found at https://ego-bridge.github.io/
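The idea of an OT-based discrepancy over joint latent features and actions can be illustrated with a minimal NumPy sketch. This is not the EgoBridge implementation: the entropic (Sinkhorn) solver, the squared-Euclidean cost, the uniform sample weights, and the names `sinkhorn_plan`, `joint_ot_discrepancy`, `action_weight`, and `reg` are all assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan between two uniform empirical distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights on human samples (assumption)
    b = np.full(m, 1.0 / m)  # uniform weights on robot samples (assumption)
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def joint_ot_discrepancy(z_h, a_h, z_r, a_r, action_weight=1.0, reg=0.1):
    """Transport cost between human and robot (latent, action) pairs.

    z_h, z_r: latent features, shapes (n, d) and (m, d).
    a_h, a_r: actions, shapes (n, k) and (m, k).
    The ground cost sums squared distances in latent and action space, so
    pairs are matched cheaply only when both latents and actions are close.
    """
    cost_z = ((z_h[:, None, :] - z_r[None, :, :]) ** 2).sum(-1)
    cost_a = ((a_h[:, None, :] - a_r[None, :, :]) ** 2).sum(-1)
    raw_cost = cost_z + action_weight * cost_a
    # Rescale the cost to [0, 1] for the solver so that `reg` is scale-free.
    scale = raw_cost.max()
    solver_cost = raw_cost / scale if scale > 0 else raw_cost
    plan = sinkhorn_plan(solver_cost, reg=reg)
    # Report the discrepancy in the original cost units.
    return float((plan * raw_cost).sum()), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_h, a_h = rng.normal(size=(16, 4)), rng.normal(size=(16, 2))
    z_r, a_r = z_h + 10.0, a_h  # robot latents shifted away from human latents
    print(joint_ot_discrepancy(z_h, a_h, z_h, a_h)[0])  # aligned domains
    print(joint_ot_discrepancy(z_h, a_h, z_r, a_r)[0])  # misaligned domains
```

In a training loop, a quantity like this would be computed on encoder outputs with a differentiable framework and added to the policy loss; the NumPy version here only shows how the joint cost couples feature alignment to action similarity.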


GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

Neural Information Processing Systems

Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. Existing methods face difficulties including a) the varying body proportions captured by in-the-wild human images; b) diverse personal belongings within the shot; and c) ambiguities in human postures and inconsistency in human textures.


Spatial frequency channels, shape bias, and adversarial robustness -- Supplementary material -- A Human psychophysics

Neural Information Processing Systems

Figure 1 shows screenshots from our online psychophysical critical band masking experiment. Accuracy heatmaps computed for different observers in our experiment showed little individual difference (Figure 1) and an even smaller difference in terms of threshold noise SD for 50% accuracy. Table 1 shows the value of each channel property computed from Gaussian fits to the averaged human data versus those found by summarizing Gaussian fits to individual human data. Given that they are similar for all channel properties, we use the former for all reported human data in the main paper. Our existing method for computing thresholds and fitting the Gaussian function to them is difficult to apply to observers that have very high noise sensitivity (low efficiency) since it relies on good performance for the baseline (zero-noise) condition.
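The dependence on baseline performance can be illustrated with a minimal sketch of a threshold estimate. This is not the authors' procedure: interpolating in log noise SD, the function name `threshold_noise_sd`, and the example values are assumptions used only to show why a 50% threshold is undefined when accuracy is already below criterion at the lowest noise level.

```python
import numpy as np


def threshold_noise_sd(noise_sds, accuracies, criterion=0.5):
    """Noise SD at which accuracy first falls to `criterion`.

    noise_sds: positive noise SDs in increasing order.
    accuracies: accuracy measured at each noise SD.
    Interpolates linearly in log noise SD (an assumption). Returns None when
    no crossing exists, for example when the lowest-noise accuracy is already
    below the criterion.
    """
    noise_sds = np.asarray(noise_sds, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    if accuracies[0] < criterion:
        return None  # poor baseline performance: threshold undefined
    below = np.nonzero(accuracies < criterion)[0]
    if below.size == 0:
        return None  # accuracy never drops below criterion in the tested range
    j = below[0]
    i = j - 1
    x0, x1 = np.log(noise_sds[i]), np.log(noise_sds[j])
    y0, y1 = accuracies[i], accuracies[j]
    t = (y0 - criterion) / (y0 - y1)  # fraction of the way from point i to j
    return float(np.exp(x0 + t * (x1 - x0)))


if __name__ == "__main__":
    sds = [0.01, 0.02, 0.04, 0.08, 0.16]
    print(threshold_noise_sd(sds, [0.9, 0.8, 0.6, 0.4, 0.2]))
    print(threshold_noise_sd(sds, [0.4, 0.3, 0.3, 0.2, 0.1]))  # low baseline
```

An observer whose accuracy starts below 50% yields no threshold at all under this kind of estimate, which is the failure mode the text describes for observers with very high noise sensitivity.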







Collaborating with Humans without Human Data

Neural Information Processing Systems

Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train human-aware agents (behavioral cloning play, or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data.